Cultural Tourism Route Optimization
- Authored by: Uvini Wijesinghe
- Duration: 10 Weeks
- Level: Intermediate
- Pre-requisite Skills: Python
User Story
- Title: Optimised Cultural Tourism Routes in Melbourne
- As a: Tourism Planner/City Developer
- I want to: Integrate data from cultural landmarks, transport infrastructure (City Circle tram stops and Melbourne Visitor Shuttle bus stops), and pedestrian movement patterns to create optimised cultural tourism routes.
- So that: Visitors can experience a diverse range of cultural sites efficiently, while being guided through high-traffic pedestrian zones and accessible transport hubs to maximise engagement with public artworks, fountains, and monuments.
Acceptance Criteria:
- All relevant public memorials, sculptures, artworks, fountains, monuments, and landmarks in Melbourne must be identified, mapped, and included in the dataset.
- Data for City Circle tram stops, Melbourne Visitor Shuttle bus stops, and pedestrian pathways must be included to ensure routes are accessible via public transport.
- High-footfall areas must be identified from pedestrian counting data so that the most popular areas are known and routes can be adjusted to optimise visitor engagement.
- Optimised routes should guide visitors through high-interest cultural sites while ensuring accessibility to transport hubs and high pedestrian traffic zones.
- Routes should cover the highest number of cultural landmarks while maintaining a smooth, logical flow for visitors.
- The system should suggest areas where new cultural landmarks, public artworks, or monuments could be developed to encourage visitor traffic in underutilised spaces.
- The final solution should have a user-friendly interface for tourists, displaying routes, landmarks, and transport stops in a clear and interactive map format.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
import folium
from folium.plugins import MarkerCluster
🚂 Train Routes
Train Routes¶
This dataset contains information about selected public transport routes in Victoria, Australia, specifically focusing on metropolitan train and tram services. Each record includes key details such as:
- route_id: A unique identifier for each route.
- agency_id: The identifier for the transport agency operating the route.
- route_short_name: A short name used to represent the route (e.g., line name).
- route_long_name: A longer description of the route, usually indicating its endpoints.
- route_type: A numerical code representing the type of transport (e.g., 2 for rail services).
- route_color: The designated colour used to visually represent the route on maps or signage (in hexadecimal format).
- route_text_color: The colour of text displayed over the route colour for readability.
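The train_id used later in this notebook is derived from route_id with a regular expression. The sketch below applies the same pattern with the standard library to show what it captures. The sample identifiers are assumptions modelled on the values shown in this notebook's tables, and `extract_route_key` is a hypothetical helper name.

```python
import re

# Same pattern the notebook passes to str.extract.
ROUTE_ID_PATTERN = re.compile(r'aus:vic:vic-(.*?):?$')


def extract_route_key(route_id):
    """Return the text after 'aus:vic:vic-' without a trailing colon, or None."""
    match = ROUTE_ID_PATTERN.search(route_id)
    return match.group(1) if match else None


# Hypothetical sample values, including one that does not follow the pattern.
samples = ["aus:vic:vic-02-ALM:", "aus:vic:vic-03-109:", "unexpected-format"]
for route_id in samples:
    print(route_id, "->", extract_route_key(route_id))
```

In pandas, rows that do not match the pattern come back as NaN rather than None, so a quick `isna().sum()` on the new column is a reasonable check after extraction.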
metro_train_routes = pd.read_csv("Datasets/gtfs/Metro Train/routes.txt", delimiter=",")
# Extract the part of route_id that follows 'aus:vic:vic-' (dropping any trailing colon)
metro_train_routes['train_id'] = metro_train_routes['route_id'].str.extract(r'aus:vic:vic-(.*?):?$', expand=False)
metro_train_routes = metro_train_routes[['train_id', 'route_short_name', 'route_long_name']]
metro_train_routes = metro_train_routes.drop_duplicates()
metro_train_routes.head()
| | train_id | route_short_name | route_long_name |
|---|---|---|---|
| 0 | 02-ALM | Alamein | Alamein - City |
| 1 | 02-BEG | Belgrave | Belgrave - City |
| 2 | 02-CBE | Cranbourne | Cranbourne - City |
| 3 | 02-CCL | City Circle | NaN |
| 4 | 02-CGB | Craigieburn | Craigieburn - City |
Train Stops¶
All location data is provided in decimal degrees, which supports integration with mapping tools and geographic information systems (GIS).
metro_train_stops = pd.read_csv("Datasets/gtfs/Metro Train/stops.txt", delimiter=",")
metro_train_stops = metro_train_stops[['stop_id', 'stop_name', 'stop_lat','stop_lon']]
metro_train_stops = metro_train_stops.drop_duplicates()
metro_train_stops['stop_id'] = metro_train_stops['stop_id'].astype(str).str.strip()
metro_train_stops.head()
| | stop_id | stop_name | stop_lat | stop_lon |
|---|---|---|---|---|
| 0 | 10117 | Jordanville Station | -37.873763 | 145.112473 |
| 1 | 10920 | Flagstaff Station | -37.811880 | 144.956043 |
| 2 | 10921 | Flagstaff Station | -37.811725 | 144.955968 |
| 3 | 10922 | Melbourne Central Station | -37.809974 | 144.962547 |
| 4 | 10923 | Melbourne Central Station | -37.809865 | 144.962516 |
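Because the coordinates are in decimal degrees, straight-line distances between stops can be estimated with the haversine formula. The sketch below is one possible helper, not part of the original notebook: `haversine_m` is a hypothetical name, and the Earth radius is the usual mean-radius approximation. The two coordinate pairs are taken from the stop table above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Coordinates from the stop table above.
flagstaff = (-37.811880, 144.956043)
melbourne_central = (-37.809974, 144.962547)

distance = haversine_m(*flagstaff, *melbourne_central)
print(f"Flagstaff to Melbourne Central: {distance:.0f} m (straight line)")
```

This is a straight-line estimate, so walking distance along streets will usually be longer.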
Train Times¶
metro_train_times = pd.read_csv("Datasets/gtfs/Metro Train/stop_times.txt", delimiter=",", dtype={'stop_headsign': str})
metro_train_times['train_id'] = metro_train_times['trip_id'].str.extract(r'(^[^-]+-[^-]+)', expand=False)
metro_train_times = metro_train_times[['trip_id', 'train_id', 'stop_id', 'stop_sequence']]
metro_train_times = metro_train_times.drop_duplicates()
metro_train_times['stop_id'] = metro_train_times['stop_id'].astype(str).str.strip()
metro_train_times.head()
| | trip_id | train_id | stop_id | stop_sequence |
|---|---|---|---|---|
| 0 | 02-ALM--16-T2-2302 | 02-ALM | 11197 | 1 |
| 1 | 02-ALM--16-T2-2302 | 02-ALM | 11198 | 2 |
| 2 | 02-ALM--16-T2-2302 | 02-ALM | 11200 | 3 |
| 3 | 02-ALM--16-T2-2302 | 02-ALM | 11202 | 4 |
| 4 | 02-ALM--16-T2-2302 | 02-ALM | 11203 | 5 |
Trip Ids with Highest Stop Count¶
# Find the highest stop_sequence for each train_id
highest_seq_per_train = metro_train_times.loc[
metro_train_times.groupby('train_id')['stop_sequence'].idxmax(),
['train_id', 'trip_id', 'stop_sequence']
].rename(columns={'stop_sequence': 'max_sequence'})
# Get unique trip_ids
train_unique_trip_ids = highest_seq_per_train['trip_id'].unique()
# Filter metro_train_times for those trip_ids
filtered_metro_train_times = metro_train_times[metro_train_times['trip_id'].isin(train_unique_trip_ids)]
filtered_metro_train_times.head(5)
| | trip_id | train_id | stop_id | stop_sequence |
|---|---|---|---|---|
| 2539 | 02-ALM--16-T5-2801 | 02-ALM | 11213 | 1 |
| 2540 | 02-ALM--16-T5-2801 | 02-ALM | 22189 | 2 |
| 2541 | 02-ALM--16-T5-2801 | 02-ALM | 12196 | 3 |
| 2542 | 02-ALM--16-T5-2801 | 02-ALM | 12198 | 4 |
| 2543 | 02-ALM--16-T5-2801 | 02-ALM | 12200 | 5 |
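The selection above keeps, for each route, the trip whose stop_sequence reaches the highest value. The sketch below replays that logic on a small hypothetical table so the behaviour is easy to inspect. All identifiers in the sample data are made up for illustration.

```python
import pandas as pd

# Hypothetical stop_times rows: route A has a short and a long trip, route B has one trip.
stop_times = pd.DataFrame({
    "trip_id": ["A-1", "A-1", "A-2", "A-2", "A-2", "B-1", "B-1"],
    "train_id": ["A", "A", "A", "A", "A", "B", "B"],
    "stop_id": ["s1", "s2", "s1", "s2", "s3", "s9", "s8"],
    "stop_sequence": [1, 2, 1, 2, 3, 1, 2],
})

# Row label of the largest stop_sequence within each route.
longest_rows = stop_times.groupby("train_id")["stop_sequence"].idxmax()
representative_trips = stop_times.loc[longest_rows, "trip_id"].unique()

# Keep every stop of the representative trips, in travel order.
representative = (
    stop_times[stop_times["trip_id"].isin(representative_trips)]
    .sort_values(["train_id", "stop_sequence"])
    .reset_index(drop=True)
)
print(representative)
```

Two caveats apply to this approach. GTFS only requires stop_sequence to increase along a trip, not to be consecutive, so the maximum sequence value approximates the stop count rather than guaranteeing it. Only one trip per route is kept, so the result represents a single direction of travel.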
Final Train Stops and Routes Dataset¶
filtered_metro_train_times = filtered_metro_train_times.copy()
# Now it's safe to modify
filtered_metro_train_times['stop_id'] = filtered_metro_train_times['stop_id'].astype(str).str.strip()
metro_train_stops['stop_id'] = metro_train_stops['stop_id'].astype(str).str.strip()
result_train1 = filtered_metro_train_times.merge(metro_train_stops, on='stop_id', how='left')
result_train1.head(2)
| | trip_id | train_id | stop_id | stop_sequence | stop_name | stop_lat | stop_lon |
|---|---|---|---|---|---|---|---|
| 0 | 02-ALM--16-T5-2801 | 02-ALM | 11213 | 1 | Flinders Street Station | -37.818307 | 144.966010 |
| 1 | 02-ALM--16-T5-2801 | 02-ALM | 22189 | 2 | Southern Cross Station | -37.818535 | 144.952144 |
result_train = result_train1.merge(metro_train_routes, on='train_id', how='left')
result_train = result_train[['trip_id', 'train_id', 'route_short_name', 'stop_id','stop_name', 'stop_lat', 'stop_lon']]
result_train.head(2)
| | trip_id | train_id | route_short_name | stop_id | stop_name | stop_lat | stop_lon |
|---|---|---|---|---|---|---|---|
| 0 | 02-ALM--16-T5-2801 | 02-ALM | Alamein | 11213 | Flinders Street Station | -37.818307 | 144.966010 |
| 1 | 02-ALM--16-T5-2801 | 02-ALM | Alamein | 22189 | Southern Cross Station | -37.818535 | 144.952144 |
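A left merge silently leaves NaN in the stop columns when a stop_id has no match. The sketch below shows one way to surface such rows using the `indicator` and `validate` options of `DataFrame.merge`. The third stop_id (`99999`) is a hypothetical unmatched value added for illustration.

```python
import pandas as pd

stop_times = pd.DataFrame({
    "trip_id": ["T1", "T1", "T1"],
    "stop_id": ["11213", "22189", "99999"],  # "99999" is a made-up ID with no match
    "stop_sequence": [1, 2, 3],
})
stops = pd.DataFrame({
    "stop_id": ["11213", "22189"],
    "stop_name": ["Flinders Street Station", "Southern Cross Station"],
})

# indicator=True adds a _merge column; validate raises if stop_id repeats in `stops`.
merged = stop_times.merge(
    stops, on="stop_id", how="left", indicator=True, validate="many_to_one"
)

unmatched = merged.loc[merged["_merge"] == "left_only", "stop_id"].tolist()
print("Stop IDs without coordinates:", unmatched)
```

An empty `unmatched` list suggests that the string conversion and whitespace stripping of stop_id aligned both tables as intended.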
🚌 Bus Routes
Bus Routes¶
The route_long_name field provides a clear description of the route’s start and end points, aiding in trip planning and wayfinding. The route_type value of 3 indicates these are bus services, and the consistent colour scheme supports visual uniformity in digital and printed transport maps.
metro_bus_routes = pd.read_csv("Datasets/gtfs/Metro Bus/routes.txt", delimiter=",")
metro_bus_routes = metro_bus_routes[['route_short_name', 'route_long_name']]
metro_bus_routes.head(2)
| | route_short_name | route_long_name |
|---|---|---|
| 0 | 831 | Kingsmere Estate - Berwick Station |
| 1 | 834 | Berwick Station |
Bus Stops¶
The data enables precise mapping of stop locations, supporting route planning, accessibility assessments, and integration with broader transport datasets. It is especially useful for visualising public transport coverage and identifying connectivity within and between suburbs.
Stop names are descriptive and commonly formatted as Street/Street (Suburb), providing clarity for users navigating the transport network.
metro_bus_stops = pd.read_csv("Datasets/gtfs/Metro Bus/stops.txt", delimiter=",")
metro_bus_stops.head(2)
| | stop_id | stop_name | stop_lat | stop_lon |
|---|---|---|---|---|
| 0 | 1000 | Dole Ave/Cheddar Rd (Reservoir) | -37.700775 | 145.018951 |
| 1 | 10001 | Rex St/Taylors Rd (Kings Park) | -37.726975 | 144.776152 |
Bus Stop Times¶
Each record belongs to a trip identified by trip_id. It includes the scheduled arrival and departure times, the stop ID, the order of the stop in the route (stop_sequence), and the distance travelled along the route (shape_dist_traveled, in metres). The pickup_type and drop_off_type columns specify how passengers can board or alight at each stop, with values indicating standard pickup and drop-off procedures. This data is useful for constructing accurate transport schedules, simulating travel behaviour, and assessing the operational efficiency of public transport services.
metro_bus_times = pd.read_csv("Datasets/gtfs/Metro Bus/stop_times.txt", delimiter=",")
metro_bus_times = metro_bus_times[['trip_id', 'stop_id', 'stop_sequence']]
metro_bus_times['bus_number'] = metro_bus_times['trip_id'].str.split('-').str[1]
metro_bus_times.head()
| | trip_id | stop_id | stop_sequence | bus_number |
|---|---|---|---|---|
| 0 | 43-477--1-MF1-1086914 | 6725 | 1 | 477 |
| 1 | 43-477--1-MF1-1086914 | 6726 | 2 | 477 |
| 2 | 43-477--1-MF1-1086914 | 9095 | 3 | 477 |
| 3 | 43-477--1-MF1-1086914 | 27586 | 4 | 477 |
| 4 | 43-477--1-MF1-1086914 | 27587 | 5 | 477 |
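The route identifier is recovered from trip_id in two slightly different ways in this notebook: a regular expression for trains and a hyphen split for buses. The sketch below places both side by side on trip IDs taken from the tables above. The helper names are hypothetical.

```python
import re


def route_key_from_trip(trip_id):
    """First two hyphen-separated fields, e.g. '02-ALM' (mirrors the train regex)."""
    match = re.match(r'[^-]+-[^-]+', trip_id)
    return match.group(0) if match else None


def route_number_from_trip(trip_id):
    """Second hyphen-separated field, e.g. '477' (mirrors str.split('-').str[1])."""
    parts = trip_id.split('-')
    return parts[1] if len(parts) > 1 else None


for trip_id in ["43-477--1-MF1-1086914", "02-ALM--16-T2-2302"]:
    print(trip_id, "->", route_key_from_trip(trip_id), "/", route_number_from_trip(trip_id))
```

The split-based version returns a string, so it needs a type conversion before it can be joined against a numeric route_short_name column.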
Trip Ids with Highest Stop Count¶
# Find the highest stop_sequence for each bus_number
highest_seq_per_bus = metro_bus_times.loc[
metro_bus_times.groupby('bus_number')['stop_sequence'].idxmax(),
['bus_number', 'trip_id', 'stop_sequence']
].rename(columns={'stop_sequence': 'max_sequence'})
# Get unique trip_ids
bus_unique_trip_ids = highest_seq_per_bus['trip_id'].unique()
# Filter metro_bus_times for those trip_ids
filtered_metro_bus_times = metro_bus_times[metro_bus_times['trip_id'].isin(bus_unique_trip_ids)]
filtered_metro_bus_times.head(5)
| | trip_id | stop_id | stop_sequence | bus_number |
|---|---|---|---|---|
| 2617 | 43-477--1-MF1-1091914 | 18850 | 1 | 477 |
| 2618 | 43-477--1-MF1-1091914 | 7253 | 2 | 477 |
| 2619 | 43-477--1-MF1-1091914 | 7254 | 3 | 477 |
| 2620 | 43-477--1-MF1-1091914 | 7255 | 4 | 477 |
| 2621 | 43-477--1-MF1-1091914 | 18772 | 5 | 477 |
Final Bus Stops and Routes Dataset¶
filtered_metro_bus_times = filtered_metro_bus_times.copy()
# Trim spaces and convert stop_id to string for consistency
filtered_metro_bus_times['stop_id'] = filtered_metro_bus_times['stop_id'].astype(str).str.strip()
metro_bus_stops['stop_id'] = metro_bus_stops['stop_id'].astype(str).str.strip()
result_bus1 = filtered_metro_bus_times.merge(metro_bus_stops, on='stop_id', how='left')
result_bus1.head(2)
| | trip_id | stop_id | stop_sequence | bus_number | stop_name | stop_lat | stop_lon |
|---|---|---|---|---|---|---|---|
| 0 | 43-477--1-MF1-1091914 | 18850 | 1 | 477 | Moonee Ponds Interchange/Mt Alexander Rd (Moon... | -37.766260 | 144.924447 |
| 1 | 43-477--1-MF1-1091914 | 7253 | 2 | 477 | Park St/Mt Alexander Rd (Moonee Ponds) | -37.761882 | 144.921515 |
result_bus = pd.merge(result_bus1, metro_bus_routes, how='left', left_on='bus_number', right_on='route_short_name')
result_bus = result_bus[['trip_id', 'stop_id', 'stop_sequence', 'bus_number', 'route_long_name', 'stop_name', 'stop_lat', 'stop_lon']]
result_bus.head(2)
| | trip_id | stop_id | stop_sequence | bus_number | route_long_name | stop_name | stop_lat | stop_lon |
|---|---|---|---|---|---|---|---|---|
| 0 | 43-477--1-MF1-1091914 | 18850 | 1 | 477 | Broadmeadows Station - Moonee Ponds | Moonee Ponds Interchange/Mt Alexander Rd (Moon... | -37.766260 | 144.924447 |
| 1 | 43-477--1-MF1-1091914 | 7253 | 2 | 477 | Broadmeadows Station - Moonee Ponds | Park St/Mt Alexander Rd (Moonee Ponds) | -37.761882 | 144.921515 |
🚃 Tram Routes
Tram Routes¶
Each record includes route_id (a unique identifier), agency_id (the transport agency operating the service), route_short_name (the tram number), and route_long_name (the full start-to-end route description). The route_type is “0”, which represents tram services under the GTFS (General Transit Feed Specification) standard. Additionally, each route is styled with a route_color and route_text_color to support visual clarity in mapping and user interface applications.
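The route_type codes mentioned in this notebook (0 for tram, 2 for rail, 3 for bus) come from the basic GTFS route type list. The sketch below maps those codes to labels. `describe_route_type` is a hypothetical helper, and the labels are shortened from the GTFS reference.

```python
# Basic GTFS route_type codes and short labels.
GTFS_ROUTE_TYPES = {
    0: "Tram, streetcar, light rail",
    1: "Subway, metro",
    2: "Rail",
    3: "Bus",
    4: "Ferry",
    5: "Cable tram",
    6: "Aerial lift",
    7: "Funicular",
    11: "Trolleybus",
    12: "Monorail",
}


def describe_route_type(code):
    """Return a readable label for a GTFS route_type code (int or numeric string)."""
    return GTFS_ROUTE_TYPES.get(int(code), f"Unknown route_type {code}")


for code in (0, 2, 3):
    print(code, "->", describe_route_type(code))
```

Some feeds use extended route type codes outside this list, which this sketch would simply report as unknown.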
metro_tram_routes = pd.read_csv("Datasets/gtfs/Metro Tram/routes.txt", delimiter=",")
metro_tram_routes = metro_tram_routes[['route_id', 'route_short_name', 'route_long_name']]
metro_tram_routes.head(2)
| | route_id | route_short_name | route_long_name |
|---|---|---|---|
| 0 | aus:vic:vic-03-1: | 1 | South Melbourne Beach - East Coburg |
| 1 | aus:vic:vic-03-109: | 109 | Port Melbourne - Box Hill |
Tram Stops¶
Each record includes the stop_id, the stop_name (typically indicating the intersecting streets and suburb), and geographic coordinates (stop_lat and stop_lon) for mapping purposes. The dataset is useful for identifying the exact location of tram stops, and can be integrated with route and trip data for route planning, navigation systems, and urban mobility analysis.
metro_tram_stops = pd.read_csv("Datasets/gtfs/Metro Tram/stops.txt", delimiter=",")
metro_tram_stops.head(2)
| | stop_id | stop_name | stop_lat | stop_lon |
|---|---|---|---|---|
| 0 | 10311 | 45-Glenferrie Rd/Wattletree Rd (Malvern) | -37.862455 | 145.028508 |
| 1 | 10371 | 44-Duncraig Ave/Wattletree Rd (Armadale) | -37.862069 | 145.025382 |
Tram Stop Times¶
Each record is linked to a trip_id and includes details such as arrival_time, departure_time, the associated stop_id, the stop's sequence in the route (stop_sequence), and the cumulative shape_dist_traveled (in metres) from the start of the trip.
metro_tram_times = pd.read_csv("Datasets/gtfs/Metro Tram/stop_times.txt", delimiter=",")
metro_tram_times = metro_tram_times[['trip_id', 'stop_id', 'stop_sequence']]
metro_tram_times['tram_number'] = metro_tram_times['trip_id'].str.split('-').str[1]
metro_tram_times.head()
| | trip_id | stop_id | stop_sequence | tram_number |
|---|---|---|---|---|
| 0 | 03-109--1-T2-129962370 | 19781 | 1 | 109 |
| 1 | 03-109--1-T2-129962370 | 19782 | 2 | 109 |
| 2 | 03-109--1-T2-129962370 | 19783 | 3 | 109 |
| 3 | 03-109--1-T2-129962370 | 19784 | 4 | 109 |
| 4 | 03-109--1-T2-129962370 | 19785 | 5 | 109 |
Trip Ids with Highest Stop Count¶
# Find the highest stop_sequence for each tram_number
highest_seq_per_tram = metro_tram_times.loc[
metro_tram_times.groupby('tram_number')['stop_sequence'].idxmax(),
['tram_number', 'trip_id', 'stop_sequence']
].rename(columns={'stop_sequence': 'max_sequence'})
# Get unique trip_ids
tram_unique_trip_ids = highest_seq_per_tram['trip_id'].unique()
# Filter metro_tram_times for those trip_ids
filtered_metro_tram_times = metro_tram_times[metro_tram_times['trip_id'].isin(tram_unique_trip_ids)]
filtered_metro_tram_times.head(5)
| | trip_id | stop_id | stop_sequence | tram_number |
|---|---|---|---|---|
| 5602 | 03-109--1-T2-129963278 | 19725 | 1 | 109 |
| 5603 | 03-109--1-T2-129963278 | 19372 | 2 | 109 |
| 5604 | 03-109--1-T2-129963278 | 19371 | 3 | 109 |
| 5605 | 03-109--1-T2-129963278 | 19370 | 4 | 109 |
| 5606 | 03-109--1-T2-129963278 | 19369 | 5 | 109 |
Final Tram Stops and Routes Dataset¶
filtered_metro_tram_times = filtered_metro_tram_times.copy()
# Trim spaces and convert stop_id to string for consistency
filtered_metro_tram_times['stop_id'] = filtered_metro_tram_times['stop_id'].astype(str).str.strip()
metro_tram_stops['stop_id'] = metro_tram_stops['stop_id'].astype(str).str.strip()
result_tram1 = filtered_metro_tram_times.merge(metro_tram_stops, on='stop_id', how='left')
result_tram1.head(2)
| | trip_id | stop_id | stop_sequence | tram_number | stop_name | stop_lat | stop_lon |
|---|---|---|---|---|---|---|---|
| 0 | 03-109--1-T2-129963278 | 19725 | 1 | 109 | 129-Beacon Cove/Light Rail (Port Melbourne) | -37.840789 | 144.932813 |
| 1 | 03-109--1-T2-129963278 | 19372 | 2 | 109 | 128-Graham St/Light Rail (Port Melbourne) | -37.837054 | 144.937190 |
result_tram1['tram_number'] = result_tram1['tram_number'].astype('int64')
result_tram = pd.merge(result_tram1, metro_tram_routes, how='left', left_on='tram_number', right_on='route_short_name')
result_tram = result_tram[['trip_id', 'stop_id', 'stop_sequence', 'tram_number', 'route_long_name', 'stop_name', 'stop_lat', 'stop_lon']]
result_tram.head(2)
| | trip_id | stop_id | stop_sequence | tram_number | route_long_name | stop_name | stop_lat | stop_lon |
|---|---|---|---|---|---|---|---|---|
| 0 | 03-109--1-T2-129963278 | 19725 | 1 | 109 | Port Melbourne - Box Hill | 129-Beacon Cove/Light Rail (Port Melbourne) | -37.840789 | 144.932813 |
| 1 | 03-109--1-T2-129963278 | 19372 | 2 | 109 | Port Melbourne - Box Hill | 128-Graham St/Light Rail (Port Melbourne) | -37.837054 | 144.937190 |
🚶🏻♂️Pedestrians
This dataset captures hourly pedestrian counts at various sensor locations across Melbourne. Each record includes a unique ID, the Location_ID of the sensor, the Sensing_Date, and the HourDay representing the hour of observation. Pedestrian flow is divided into Direction_1 and Direction_2, with their sum recorded as Total_of_Directions. Additional fields such as Sensor_Name and Location (latitude, longitude) help identify where the sensor is positioned.
Import and Clean Pedestrians dataset¶
ped_counts = pd.read_csv("Datasets/pedestrian-counting-system-monthly-counts-per-hour.csv")
ped_counts.head(2)
| | ID | Location_ID | Sensing_Date | HourDay | Direction_1 | Direction_2 | Total_of_Directions | Sensor_Name | Location |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 371420221110 | 37 | 2022-11-10 | 14 | 77 | 90 | 167 | Lyg260_T | -37.80107122, 144.96704554 |
| 1 | 521220230401 | 52 | 2023-04-01 | 12 | 335 | 321 | 656 | Eli263_T | -37.81252157, 144.9619401 |
# Split 'Location' into separate Latitude and Longitude
ped_counts[['Latitude', 'Longitude']] = ped_counts['Location'].str.split(',', expand=True)
# Create an explicit copy of the selected columns to avoid SettingWithCopyWarning
date_counts = ped_counts[['Location_ID', 'Sensing_Date', 'Total_of_Directions', 'Sensor_Name', 'Latitude', 'Longitude']].copy()
# Convert Latitude and Longitude to float
date_counts['Latitude'] = date_counts['Latitude'].astype(float)
date_counts['Longitude'] = date_counts['Longitude'].astype(float)
# Display the new DataFrame
date_counts.head(5)
| | Location_ID | Sensing_Date | Total_of_Directions | Sensor_Name | Latitude | Longitude |
|---|---|---|---|---|---|---|
| 0 | 37 | 2022-11-10 | 167 | Lyg260_T | -37.801071 | 144.967046 |
| 1 | 52 | 2023-04-01 | 656 | Eli263_T | -37.812522 | 144.961940 |
| 2 | 84 | 2022-03-30 | 1611 | ElFi_T | -37.817980 | 144.965034 |
| 3 | 54 | 2023-09-28 | 332 | Swa607_T | -37.804024 | 144.963084 |
| 4 | 61 | 2022-01-03 | 284 | RMIT14_T | -37.807675 | 144.963091 |
# Group by sensor (ID, name and coordinates) and sum 'Total_of_Directions'
sensor_count = date_counts.groupby(['Location_ID', 'Sensor_Name', 'Latitude', 'Longitude'], as_index=False)['Total_of_Directions'].sum()
# Display the result
sensor_count.head()
| | Location_ID | Sensor_Name | Latitude | Longitude | Total_of_Directions |
|---|---|---|---|---|---|
| 0 | 1 | Bou292_T | -37.813494 | 144.965153 | 17472047 |
| 1 | 2 | Bou283_T | -37.813807 | 144.965167 | 9253528 |
| 2 | 3 | Swa295_T | -37.811015 | 144.964295 | 21024227 |
| 3 | 4 | Swa123_T | -37.814880 | 144.966088 | 23849378 |
| 4 | 5 | PriNW_T | -37.818742 | 144.967877 | 18475133 |
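Summed totals favour sensors with more recorded hours. If sensors were installed at different times or have gaps in their records (an assumption, not something checked in this notebook), an average per recorded hour may rank locations more fairly. The sketch below contrasts the two rankings on hypothetical records; the sensor names and counts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical hourly records: (sensor_name, total_of_directions).
records = [
    ("Sensor_A", 900), ("Sensor_A", 1100), ("Sensor_A", 1000), ("Sensor_A", 1000),
    ("Sensor_B", 1500), ("Sensor_B", 1700),
]

totals = defaultdict(int)  # summed count per sensor
hours = defaultdict(int)   # number of hourly records per sensor
for sensor, count in records:
    totals[sensor] += count
    hours[sensor] += 1

mean_hourly = {sensor: totals[sensor] / hours[sensor] for sensor in totals}

ranked_by_total = sorted(totals, key=totals.get, reverse=True)
ranked_by_mean = sorted(mean_hourly, key=mean_hourly.get, reverse=True)
print("By total:", ranked_by_total)
print("By mean per recorded hour:", ranked_by_mean)
```

With pandas, the same comparison could be made by aggregating Total_of_Directions with both `sum` and `mean` in a single `groupby`.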
Exploratory Data Analysis For Pedestrian Data¶
This interactive map visualises pedestrian traffic data across Melbourne using folium and MarkerCluster for efficient rendering. The base map is centred on the average coordinates of all sensor locations. Each sensor is represented by a circular bubble marker, with the size of the bubble scaled based on the total number of pedestrians recorded in both directions (Total_of_Directions). Larger bubbles indicate higher pedestrian counts, allowing for quick identification of high-footfall areas. The markers include popups that display detailed information, such as the sensor name and total pedestrian count, enhancing usability for exploration and analysis.
# Create a base map centered on the average coordinates (Melbourne)
m = folium.Map(location=[sensor_count['Latitude'].mean(), sensor_count['Longitude'].mean()], zoom_start=15)
# Initialize MarkerCluster for better performance when there are many markers
marker_cluster = MarkerCluster().add_to(m)
# Add bubble markers with size based on the count
for _, row in sensor_count.iterrows():
    # Scale the bubble radius (in pixels) linearly with the count
    bubble_size = row['Total_of_Directions'] / 500000  # One pixel of radius per 500,000 pedestrians
    folium.CircleMarker(
        location=[row['Latitude'], row['Longitude']],
        radius=bubble_size,  # Size based directly on count
        color="blue",  # Color can be dynamic based on intensity
        fill=True,
        fill_opacity=0.6,
        fill_color="blue",  # You can adjust this to a gradient for more color intensity
        popup=f"<b>Sensor Name:</b> {row['Sensor_Name']}<br><b>Total of Directions:</b> {row['Total_of_Directions']}"
    ).add_to(marker_cluster)
# Display the map
m
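With linear scaling, the busiest sensor in the table above (about 23.8 million counts) gets a radius of roughly 48 pixels, and the visual area grows with the square of the count. One alternative, sketched below, scales the radius by the square root so that area is proportional to the count. `bubble_radius` and its default limits are hypothetical choices, not part of the original notebook.

```python
from math import sqrt


def bubble_radius(count, max_count, max_radius=30.0, min_radius=3.0):
    """Radius in pixels such that circle area grows in proportion to count."""
    if max_count <= 0:
        return min_radius
    return max(min_radius, max_radius * sqrt(count / max_count))


# Totals copied from the sensor_count table earlier in this section.
counts = {
    "Bou292_T": 17472047,
    "Bou283_T": 9253528,
    "Swa295_T": 21024227,
    "Swa123_T": 23849378,
    "PriNW_T": 18475133,
}

largest = max(counts.values())
radii = {name: round(bubble_radius(c, largest), 1) for name, c in counts.items()}
for name, radius in radii.items():
    print(f"{name}: {radius} px")
```

The resulting value could be passed as `radius` to `folium.CircleMarker` in place of the linear `bubble_size`.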
🧑🏻🎨 Public Artworks, Fountains and Monuments
This dataset provides information about public artworks, fountains and monuments located across the City of Melbourne. Each entry includes details such as the artwork's name, artist (where known), year of creation, material or structure type, and specific address or location. Coordinates are provided in both geographic (latitude and longitude) and projected (Easting and Northing) formats, allowing for spatial analysis and visualisation.
Additional context includes alternate names, authorship, and original data sources such as aerial imagery or field surveys.
Import Artworks, Fountains and Monuments dataset¶
places = pd.read_csv("Datasets/public-artworks-fountains-and-monuments.csv")
places.head(2)
| | Asset Type | Name | Xorg | Xsource | Address Point | Artist | Alternate Name | Art Date | Mel way Ref | Respective Author | Structure | Co-ordinates | Easting | Northing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Art | Port Phillip Monument | City of Melbourne | MCC - Ortho Image March 2005 - Final | 178 Sims Street, WEST MELBOURNE | unknown | NaN | 1941 | 2S_K11 | City Of Melbourne | Basalt monument | -37.8056957854241, 144.907291041632 | 315771.745 | 5813680.208 |
| 1 | Art | Bird Panels | City of Melbourne | MCC - Ortho Image March 2005 - Final | 76 Canning Street | Di Christensen and Bernice McPherson | NaN | 1995 | 2A_E5 | City Of Melbourne | Stainless-steel panels | -37.7953526839703, 144.940687314302 | 318686.757 | 5814893.278 |
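The Co-ordinates column stores latitude and longitude as a single comma-separated string. The sketch below is one defensive way to parse such a value, returning None for missing or malformed entries instead of raising. `parse_coordinates` is a hypothetical helper, and the sample string is taken from the first row of the table above.

```python
def parse_coordinates(text):
    """Parse a 'lat, lon' string into floats; return None if it is malformed."""
    try:
        lat_text, lon_text = text.split(',')
        lat, lon = float(lat_text), float(lon_text)
    except (AttributeError, ValueError):
        # AttributeError: value is not a string (e.g. NaN); ValueError: wrong shape or non-numeric.
        return None
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon


print(parse_coordinates("-37.8056957854241, 144.907291041632"))
print(parse_coordinates("not a coordinate"))
```

Rows that parse to None can then be dropped or reviewed before plotting, rather than causing an error inside the mapping loop.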
Exploratory Data Analysis For Artworks, Fountains and Monuments¶
Pie Chart: Different Types of Assets¶
The dataset provides the frequency count of each asset type, which is visualised using a pie chart. The chart represents the distribution of various asset types (such as public artworks, monuments, and fountains) in the City of Melbourne. Each asset type's proportion is displayed in percentages, making it easy to assess the relative abundance of each type.
Distribution of Asset Types:
- Artworks: 74.3%
- Monuments: 18%
- Fountains: 7.7%
# Count frequency of each Asset Type
asset_counts = places['Asset Type'].value_counts()
# Plot pie chart
plt.figure(figsize=(8, 8))
plt.pie(asset_counts, labels=asset_counts.index, autopct='%1.1f%%', startangle=140, colors=plt.cm.Paired.colors)
plt.title('Distribution of Asset Types')
plt.axis('equal') # Equal aspect ratio ensures the pie chart is circular.
plt.show()
Bar Chart: Number of Artworks Managed by Different Organizations¶
This bar chart shows how many artworks in the City of Melbourne are associated with each organisation (the 'Xorg' column), making it easy to see which organisations have the most public art installations.
The x-axis represents the organisations, while the y-axis indicates the number of artworks associated with each.
Number of Artworks by Organisation (Xorg):
- City of Melbourne: 199 artworks
- VicUrban: 38 artworks
- Beveridge Williams Surveyors: 15 artworks
# Count frequency of each Xorg
xorg_counts = places['Xorg'].value_counts()
# Plot bar chart
plt.figure(figsize=(10, 6))
ax = xorg_counts.plot(kind='bar', color='skyblue', edgecolor='black')
# Adding text on top of each bar to show the count
for bar in ax.patches:
    ax.annotate(f'{int(bar.get_height())}',  # Show the count as a whole number
                (bar.get_x() + bar.get_width() / 2, bar.get_height()),
                xytext=(0, 5),
                textcoords='offset points',
                ha='center',
                va='bottom',
                fontsize=10,
                color='black')
plt.title('Number of Artworks by Xorg')
plt.xlabel('Xorg')
plt.ylabel('Number of Artworks')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
Word Cloud: Most Common Artists¶
# Drop NaN values from 'Artist' column
artists = places['Artist'].dropna()
# Combine all artist names into a single string
artist_text = " ".join(artists)
# Generate the word cloud
wordcloud = WordCloud(width=800, height=400, background_color='white', colormap='viridis').generate(artist_text)
# Plot the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Word Cloud of Artists')
plt.tight_layout()
plt.show()
Map: Map Visualization of each Artworks, Fountains and Monuments¶
This interactive map displays the locations of various public artworks, monuments, sculptures, and panels throughout Melbourne. Each marker on the map represents an asset and is colour-coded based on its type:
- Blue markers represent artworks.
- Green markers represent monuments.
- Purple markers represent sculptures.
- Orange markers represent panels.
- Grey markers represent any asset type not listed in the colour mapping, such as fountains.
Hovering over a marker shows a detailed tooltip with the asset's type, name, associated organisation, artist, and the year it was created. Clicking a marker opens a popup with the asset's name. This map serves as an informative guide to the diverse range of public art in Melbourne, offering users the opportunity to explore and learn more about these significant cultural landmarks.
# Convert 'Co-ordinates' column to separate latitude and longitude
places[['Latitude', 'Longitude']] = places['Co-ordinates'].str.split(',', expand=True)
places['Latitude'] = places['Latitude'].astype(float)
places['Longitude'] = places['Longitude'].astype(float)
# Define color mapping for different Asset Types
asset_colors = {
"Art": "blue",
"Monument": "green",
"Sculpture": "purple",
"Panel": "orange"
}
# Create a base map centered on Melbourne
m = folium.Map(location=[-37.81, 144.96], zoom_start=13)
# Add markers with detailed tooltip
for _, row in places.iterrows():
    asset_type = row['Asset Type']
    color = asset_colors.get(asset_type, "gray")  # Default color if type is not in the mapping
    # Construct the tooltip with bold labels
    tooltip = f"""
    <b>Asset Type:</b> {row['Asset Type']}<br>
    <b>Name:</b> {row['Name']}<br>
    <b>Organization:</b> {row['Xorg']}<br>
    <b>Artist:</b> {row['Artist']}<br>
    <b>Year:</b> {row['Art Date']}
    """
    folium.Marker(
        location=[row['Latitude'], row['Longitude']],
        popup=row['Name'],
        tooltip=tooltip,
        icon=folium.Icon(color=color)
    ).add_to(m)
# Display the map
m
Mapping the Train Routes
The map below visualises Melbourne's train routes interactively. It is built from the result_train DataFrame, and each train route (train_id) is drawn in its own colour, taken from a palette of visually distinct colours.
import folium
import pandas as pd
from branca.element import Template, MacroElement
import matplotlib.colors as mcolors
import random
# Create map centered on average coordinates
map_center = [result_train['stop_lat'].mean(), result_train['stop_lon'].mean()]
m = folium.Map(location=map_center, zoom_start=12, tiles='CartoDB positron')
# Generate distinct colors for each train_id
def generate_distinct_colors(n):
    """Generate n visually distinct colors"""
    colors = []
    # Start with a set of good distinct colors
    base_colors = list(mcolors.TABLEAU_COLORS.values())  # Tableau palette
    base_colors.extend(['#FF00FF', '#00FFFF', '#FFA500', '#800080', '#008080'])
    if n <= len(base_colors):
        return base_colors[:n]
    # If we need more colors than we have base colors, generate random ones
    for _ in range(n - len(base_colors)):
        # Random hue with high saturation and brightness (not guaranteed to be distinct)
        h = random.random()
        s = 0.7 + random.random() * 0.3
        v = 0.6 + random.random() * 0.3
        rgb = mcolors.hsv_to_rgb([h, s, v])
        colors.append(mcolors.to_hex(rgb))
    return base_colors + colors
# Get unique train_ids and assign colors
unique_train_ids = result_train['train_id'].unique()
colors = generate_distinct_colors(len(unique_train_ids))
color_dict = dict(zip(unique_train_ids, colors))
# Add each train route separately
for train_id, group in result_train.groupby('train_id'):
    color = color_dict[train_id]
    # Create feature group for this route
    route_name = group['route_short_name'].iloc[0] if 'route_short_name' in group.columns else train_id
    fg = folium.FeatureGroup(name=f"{route_name} (Train ID: {train_id})")
    # Add line for the route
    line = folium.PolyLine(
        locations=group[['stop_lat', 'stop_lon']].values.tolist(),
        color=color,
        weight=5,
        opacity=0.7,
        tooltip=f"<b>{route_name}</b><br>Train ID: {train_id}"
    )
    fg.add_child(line)
    # Add markers for each stop
    for idx, row in group.iterrows():
        stop_name = row['stop_name'] if 'stop_name' in row else f"Stop {row['stop_id']}"
        stop_id = row['stop_id'] if 'stop_id' in row else "N/A"
        marker = folium.CircleMarker(
            location=[row['stop_lat'], row['stop_lon']],
            radius=6,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=1,
            tooltip=f"""
            <div style='width: 200px'>
            <b>Route:</b> {route_name}<br>
            <b>Train ID:</b> {train_id}<br>
            <b>Stop:</b> {stop_name}<br>
            <b>Stop ID:</b> {stop_id}
            </div>
            """
        )
        fg.add_child(marker)
    m.add_child(fg)
# Add layer control to toggle routes
folium.LayerControl().add_to(m)
# Custom CSS template for better tooltips
template = """
{% macro html(this, kwargs) %}
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Train Routes</title>
<style>
.folium-tooltip {
font-size: 14px;
line-height: 1.4;
max-width: 250px;
}
.folium-tooltip table {
border-collapse: collapse;
}
.folium-tooltip th, .folium-tooltip td {
padding: 2px 5px;
text-align: left;
}
.folium-tooltip hr {
margin: 5px 0;
border: 0;
border-top: 1px solid #eee;
}
</style>
</head>
</html>
{% endmacro %}
"""
macro = MacroElement()
macro._template = Template(template)
m.get_root().add_child(macro)
# Display the map
m
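The colour helper above falls back to random hues once the base palette is exhausted, so colours can change between runs and may land close to each other. A deterministic alternative, sketched below with the standard library, spaces hues evenly around the colour wheel. `evenly_spaced_colors` and its saturation and brightness defaults are hypothetical choices for illustration.

```python
import colorsys


def evenly_spaced_colors(n, saturation=0.75, value=0.85):
    """Return n hex colours with hues spaced evenly around the colour wheel."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        colors.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return colors


palette = evenly_spaced_colors(16)
print(palette)
```

Because the output depends only on `n`, the same route receives the same colour on every run as long as the order of routes is stable. With many routes, neighbouring hues still become hard to tell apart, so the layer control remains useful for isolating individual lines.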
Mapping the Bus Routes
import folium
import pandas as pd
from branca.element import Template, MacroElement
import matplotlib.colors as mcolors
import random
# Create map centered on average coordinates
map_center = [result_bus['stop_lat'].mean(), result_bus['stop_lon'].mean()]
m = folium.Map(location=map_center, zoom_start=12, tiles='CartoDB positron')
# Generate distinct colors for each bus_number
def generate_distinct_colors(n):
    """Generate n visually distinct colors"""
    colors = []
    # Start with a set of good distinct colors
    base_colors = list(mcolors.TABLEAU_COLORS.values())  # Tableau palette
    base_colors.extend(['#FF00FF', '#00FFFF', '#FFA500', '#800080', '#008080'])
    if n <= len(base_colors):
        return base_colors[:n]
    # If we need more colors than we have base colors, generate random ones
    for _ in range(n - len(base_colors)):
        # Random hue with high saturation and brightness (not guaranteed to be distinct)
        h = random.random()
        s = 0.7 + random.random() * 0.3
        v = 0.6 + random.random() * 0.3
        rgb = mcolors.hsv_to_rgb([h, s, v])
        colors.append(mcolors.to_hex(rgb))
    return base_colors + colors
# Get unique bus numbers and assign colors
unique_bus_numbers = result_bus['bus_number'].unique()
colors = generate_distinct_colors(len(unique_bus_numbers))
color_dict = dict(zip(unique_bus_numbers, colors))
# Add each bus route separately
for bus_number, group in result_bus.groupby('bus_number'):
    color = color_dict[bus_number]
    # Create feature group for this route
    route_name = group['route_long_name'].iloc[0] if 'route_long_name' in group.columns else bus_number
    fg = folium.FeatureGroup(name=f"{route_name} (Bus Number: {bus_number})")
    # Add line for the route
    line = folium.PolyLine(
        locations=group[['stop_lat', 'stop_lon']].values.tolist(),
        color=color,
        weight=5,
        opacity=0.7,
        tooltip=f"<b>{route_name}</b><br>Bus Number: {bus_number}"
    )
    fg.add_child(line)
    # Add markers for each stop
    for idx, row in group.iterrows():
        stop_name = row['stop_name'] if 'stop_name' in row else f"Stop {row['stop_id']}"
        stop_id = row['stop_id'] if 'stop_id' in row else "N/A"
        marker = folium.CircleMarker(
            location=[row['stop_lat'], row['stop_lon']],
            radius=6,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=1,
            tooltip=f"""
            <div style='width: 200px'>
            <b>Route:</b> {route_name}<br>
            <b>Bus Number:</b> {bus_number}<br>
            <b>Stop:</b> {stop_name}<br>
            <b>Stop ID:</b> {stop_id}
            </div>
            """
        )
        fg.add_child(marker)
    m.add_child(fg)
# Add layer control to toggle routes
folium.LayerControl().add_to(m)
# Custom CSS template for better tooltips
template = """
{% macro html(this, kwargs) %}
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Bus Routes</title>
<style>
.folium-tooltip {
font-size: 14px;
line-height: 1.4;
max-width: 250px;
}
.folium-tooltip table {
border-collapse: collapse;
}
.folium-tooltip th, .folium-tooltip td {
padding: 2px 5px;
text-align: left;
}
.folium-tooltip hr {
margin: 5px 0;
border: 0;
border-top: 1px solid #eee;
}
</style>
</head>
</html>
{% endmacro %}
"""
macro = MacroElement()
macro._template = Template(template)
m.get_root().add_child(macro)
# Display the map
m